Nathan Day
natedayta.com
2/4/2018
library(sf) %>% tidyverse for spatial data exploration
library(spdep) and glm()
library(geojsonio) # get ODP data
library(sf) # the new spatial kid
library(spdep) # the spatial grandaddy
library(broom) # extract model info easy
library(magrittr) # %<>% life
library(tidyverse) # duh
Top 3 addresses for drug and non-drug crime
arrange(crime_counts, -n) %>% group_by(drug_flag) %>% slice(1:3)
## # A tibble: 6 x 3
## # Groups:   drug_flag [2]
##   address                             drug_flag     n
##   <chr>                               <chr>     <int>
## 1 600 E MARKET ST Charlottesville VA  drugs       410
## 2 400 GARRETT ST Charlottesville VA   drugs        38
## 3 700 PROSPECT AVE Charlottesville VA drugs        38
## 4 600 E MARKET ST Charlottesville VA  not_drugs   635
## 5 700 PROSPECT AVE Charlottesville VA not_drugs   412
## 6 1100 5TH ST SW Charlottesville VA   not_drugs   341
The police station's address is 606 E Market Street…
"The answer is quite simple - when individuals walk in to the police department to file a report the physical address of the department (606 E Market Street) is often used in that initial report if no other known address is available at the time. This is especially true for incidents of found or lost property near the downtown mall where there is no true known incident location. The same is true for any warrant services that result in a police report occurring at the police department." - CPD
station_props <- arrange(crime_counts, -n) %>%
group_by(drug_flag) %>%
add_count(wt = n, name = "nn") %>%
slice(1)
with(station_props, prop.test(n, nn)) %>% tidy
## # A tibble: 1 x 9
##   estimate1 estimate2 statistic p.value parameter conf.low conf.high method
##       <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>
## 1     0.222    0.0218     2135.       0         1    0.181     0.220 2-sam…
## # … with 1 more variable: alternative <chr>
No, they are not. The station address accounts for about 22% of drug reports but only about 2% of non-drug reports.
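For readers new to prop.test(), here is a minimal sketch of the same two-sample comparison on made-up counts. The numbers and the names at_station and all_reports are hypothetical, not the Charlottesville data; the point is only to show what the two estimates in the output are.

```r
# Hypothetical counts, NOT the Charlottesville data
at_station  <- c(drugs = 40,  not_drugs = 20)    # reports filed at one address
all_reports <- c(drugs = 200, not_drugs = 1000)  # all reports in each group

# Two-sample test for equality of proportions
res <- prop.test(at_station, all_reports)

res$estimate  # share of each group's reports filed at that address
res$p.value   # a small value means the two shares differ
```

Each estimate is just successes divided by trials for one group, which is why the deck's code passes n (reports at the top address) and nn (all reports for that drug_flag).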
Census blocks make a lot of sense because:
library(tidycensus)
long_url <- "https://opendata.arcgis.com/datasets/e60c072dbb734454a849d21d3814cc5a_14.geojson"
census <- geojsonio::geojson_read(long_url, what = "sp") %>%
st_as_sf()
ggplot(census, aes(fill = HU_Vacant / Housing_Units)) + # fill with whatever you want
geom_sf() +
scale_fill_viridis_c() # aww yess
crime <- read_csv("https://github.com/NathanCDay/cville_crime/raw/master/crime_geocode.csv")
crime %<>% filter(complete.cases(.))
crime %<>% filter(address != "600 E MARKET ST Charlottesville VA")
Convert to sf, with the same Coordinate Reference System (critical)
crime %<>% st_as_sf(coords = c("lon", "lat"), crs = st_crs(census))
sf::st_within() and friends
crime %<>% mutate(within = st_within(crime, census) %>% as.numeric) %>%
filter(!is.na(within))
There are a bunch of other great st_x(sf_a, sf_b) functions too. If you want to do it, there's a tool for it.
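To make the point-in-polygon step concrete, here is a small sketch on toy geometry. The helper sq(), the block_id column, and the coordinates are all made up for illustration; only st_within() itself comes from the slides. It assumes library(sf) is installed.

```r
library(sf)

# Hypothetical helper: a unit square with its lower-left corner at (x0, y0)
sq <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1),
                        c(x0, y0 + 1), c(x0, y0))))
}

# Two toy "census blocks" and three toy "crime reports"
blocks <- st_sf(block_id = c("a", "b"), geometry = st_sfc(sq(0, 0), sq(1, 0)))
pts <- st_sf(report = 1:3,
             geometry = st_sfc(st_point(c(0.5, 0.5)),
                               st_point(c(1.5, 0.5)),
                               st_point(c(5, 5))))

# st_within() returns, for each point, the row index of the containing polygon(s)
hits <- st_within(pts, blocks)

# Collapse to one index per point, NA when the point falls outside every block
block_index <- vapply(hits, function(i) if (length(i)) i[1] else NA_integer_,
                      integer(1))
block_index
```

The result is a row index into blocks, not an ID column, so any later join has to match on row position or look the ID up explicitly.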
crime %<>% mutate(drug_flag = ifelse(grepl("drug", Offense, ignore.case = TRUE),
"drugs", "not_drugs"))
tidyverse
crime_block <- st_set_geometry(crime, NULL) %>% # remove geometry for spread() to work
group_by(within, drug_flag) %>%
count() %>%
spread(drug_flag, n) %>%
mutate(frac_drugs = drugs / sum(drugs + not_drugs)) %>%
ungroup() # geom_sf doesn't care for grouped dfs/tbls
census %<>% inner_join(crime_block, by = c("OBJECTID" = "within"))
ggplot(census, aes(fill = frac_drugs)) +
geom_sf() + scale_fill_viridis_c()
Test with Moran's I statistic
## # A tibble: 1 x 5
##   statistic p.value parameter method                            alternative
##       <dbl>   <dbl>     <dbl> <chr>                             <chr>
## 1     0.213   0.009       991 Monte-Carlo simulation of Moran I greater
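The code behind this output is not on the slide; the method string suggests a Monte Carlo Moran test such as the one in library(spdep), but that is an assumption. As a rough illustration of what such a test does, here is a base R sketch on a toy 4 x 4 grid. The function morans_i(), the grid, and the values are all hypothetical. It computes I = (n / sum(w)) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), where z is the centered variable and w is a 0/1 neighbour matrix, then compares the observed value against random shuffles.

```r
# Moran's I for values x and a neighbour weight matrix w (hypothetical helper)
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy 4 x 4 grid; cells sharing an edge are neighbours (rook adjacency)
coords <- expand.grid(row = 1:4, col = 1:4)
w <- (as.matrix(dist(coords, method = "manhattan")) == 1) * 1

# A smooth gradient, so neighbouring cells hold similar values
x <- coords$row + coords$col

# Monte Carlo test: shuffle the values over the grid and recompute I
set.seed(1)
n_sim <- 999
obs  <- morans_i(x, w)
sims <- replicate(n_sim, morans_i(sample(x), w))
p_value <- (sum(sims >= obs) + 1) / (n_sim + 1)  # one-sided, "greater"

obs
p_value
```

A clearly positive I with a small p-value means nearby cells are more alike than a random arrangement would produce, which is how the 0.213 above should be read.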
Are there other community metrics that are correlated?
Median income data comes from the American Community Survey via library(tidycensus) to supplement housing and demographics from the original Census data from ODP.
Use glm() to fit the highly correlated predictors simultaneously.
mod <- glm(frac_drugs ~ frac_black + income,
data = census, family = quasibinomial())
##                  Estimate   Std. Error     t value     Pr(>|t|)
## (Intercept) -3.455431e+00 2.272112e-01 -15.2080119 9.842326e-17
## frac_black   1.570012e+00 3.546781e-01   4.4265838 9.388946e-05
## income      -5.241156e-07 2.786602e-06  -0.1880842 8.519288e-01
The proportion of the population that is black is a significant predictor (p ≈ 9e-05), but median income is not (p ≈ 0.85).
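For anyone who wants to try this kind of model without the census data, here is a sketch on simulated values. The data frame toy and every number in it are invented; only the model formula and the quasibinomial() family mirror the slide. quasibinomial() is a common choice when the response is a fraction between 0 and 1 rather than a count of successes.

```r
set.seed(42)
n <- 40

# Simulated stand-ins for the block-level columns (NOT real data)
toy <- data.frame(
  frac_black = runif(n),            # share of population, 0 to 1
  income     = runif(n, 2e4, 1e5)   # median income in dollars
)

# Build a response that depends on frac_black only, on the logit scale
toy$frac_drugs <- plogis(-3 + 1.5 * toy$frac_black + rnorm(n, sd = 0.3))

mod_toy <- glm(frac_drugs ~ frac_black + income,
               data = toy, family = quasibinomial())

# Coefficient table with the same columns as the output above
coef(summary(mod_toy))
```

Coefficients are on the log-odds scale, so the tiny income estimate in the real fit partly reflects income being measured in dollars.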
## # A tibble: 1 x 5
##   statistic p.value parameter method                            alternative
##       <dbl>   <dbl>     <dbl> <chr>                             <chr>
## 1   -0.0691    0.65       350 Monte-Carlo simulation of Moran I greater
Does drug enforcement target black communities?
More steps:
Get data about police patrol locations/frequency
Dig deeper on the crime reporting procedure
How many of these "drug" crimes are low-level offenses?
Add temporal elements to the model, e.g. season, time of day